ABSTRACT

In  order  to  investigate  the  performance  of  mixed-  versus  homogeneous-culture  military  teams,  the  NATO  RTO 
Research  Task  Group,  HFM-138/RTG  on  Adaptability  in  Multinational  Coalitions  conducted  an  experiment 
using  a  complex,  but  very  absorbing  and  immersive,  computer-based  role-play  game  using  a  modern  urban 
search-for-contraband  scenario.  Game-play  required  planning,  resource  allocation,  situation  awareness, 
communication,  and  coordination  for  successful  performance.  This  paper  briefly  describes  the  experiment  and 
its  results  prior  to  discussing  the  lessons  learned  in  conducting  the  experiment.  It  focuses  on  practical 
methodological  and  logistical  implications  for  future  research  on  culture  and  teamwork  using  computer 
games  in  general.  It  also  considers  deeper  issues  in  hypothesis  generation,  scenario  and  task  definition, 
experimental  design,  data  analysis,  and  results  presentation  and  communication. 


1.0  THE  NATO  RTO  HFM-138/RTG  COMPUTER  GAME  EXPERIMENT 

Good communication is crucial for (possibly geographically distributed) team members conducting a
complex military task such as searching for hidden weapons in an urban environment. It is reasonable to
presume that effective communication should be easiest for people who share a common culture. Hence, the
NATO Research and Technology Organization (RTO) Human Factors and Medicine Panel Research Task
Group on Adaptability in Coalition Teamwork (HFM-138/RTG) conducted an experiment entitled “Leader
and Team Adaptability in Multinational Coalitions (LTAMC)” to investigate the performance of mixed-
versus homogeneous-culture military teams. Before we can discuss the lessons learned from this experiment,
we need to briefly review its hypothesis, methods, and principal results. For a more detailed treatment, see
NATO RTO HFM-138/RTG (2008).

1.1  Hypothesis  &  Scenario 

The principal hypothesis was that teams whose members are all from the same nation perform better than
teams whose members are from different nations. The experiment utilized a complex, but very absorbing and
immersive, computer-based role-play game using a modern urban search-for-contraband scenario specifically
tailored for this NATO experiment [Leung, Diller, & Ferguson, 2005; Warren, Diller, Leung, Ferguson, &
Sutton, 2005], which required planning, resource allocation, situation awareness, communication, and
coordination for successful performance. Good performance also required maintaining the goodwill of the
local “populace” (i.e., computer-generated characters) who could provide useful or misleading information to
the search team.

1.2 Participants

The  experiment  involved  56  four-person  teams  (224  military  officers  in  all).  In  48  of  the  teams,  all  four  team 
members  were  from  the  same  nation;  in  8  of  the  teams,  the  four  members  were  from  different  nations.  For 
experimental design purposes, we have 7 national groups of 7 to 9 teams each: Bulgaria (8 teams), The
Netherlands  (8  teams),  Norway-senior  (8  teams  of  senior  officers),  Norway-junior  (8  teams  of  junior  officers), 
United  States  (7  teams),  Sweden  (9  teams),  and  Mixed  nationality  (8  teams). 

Within each team, the members had to be no more than one rank apart. Although there was no age
requirement, the rank constraint meant that team members were of comparable ages. The computer game-play
was all in English and all communication was by keyboard. Hence, all participants had to have met a
NATO-required level of English proficiency and a reasonable, but unspecified, level of computer experience.
Post-play metrics of English proficiency and game experience were developed from responses to pre-game
questionnaires. The resulting national, age, English proficiency, and game experience profiles of the 56 teams
are shown in Figure 1.


[Figure 1 plot: Age, English Proficiency & Game Experience: 56 Teams]


Figure  1:  Demographic  profiles  of  the  56  teams.  Game  experience  is  proportional  to  bubble  size. 
Letters  indicate  national  composition  of  the  teams:  Bulgaria  (b),  The  Netherlands  (d),  Norway-senior 
age  (n),  Norway-junior  age  (j),  Sweden  (s),  United  States  (u),  mixed  culture  (m). 


1.3  Procedure  &  Metrics 

Each team member was seated at a computer terminal. Same-nation players were visually and auditorily
shielded from one another at a site in their home nation; mixed-nation players played in their own nations over
the Internet. After two to three hours of training and a break, players were briefed on their mission and
engaged in a planning session before game-play proper. The main team task was to amass as many
‘goodwill’  points  as  possible.  Points  were  primarily  earned  by  finding  weapons  caches  and  lost  by  angering 
the local populace or by opening empty crates.

Since game-play and communication were by keyboard, every keystroke was available for analysis. During the
game, there were probes from a “superior officer” to determine situation awareness at three different times.
The primary dependent variable was the number of goodwill points earned by the team. (Since team members
could specialize such that a communications officer would not find any caches and a sensor operator might
find several, individual scores are meaningless.) Other performance measures include the amount of
communications and the degree of situation awareness.

1.4  Selected  Results 

Figure 2 shows the value of each team on the main performance metric (T-score of goodwill points) grouped
by team national composition. It is clear that the mixed-nation teams are mostly in the upper half of the
performance distribution, contrary to the hypothesis.


[Figure 2 plot: Goodwill: 56 T Scores, by Team National Composition]


Figure  2:  Team  “goodwill”  performance  T-scores  (Mean  =  50;  SD  =  10)  for  each  of  the  56  teams 
grouped  by  national  composition.  Key:  Bulgaria  (Bu),  The  Netherlands  (NL),  Norway-senior  age 
(No.s), Norway-junior age (No.j), Sweden (Sw), United States (US), Mixed-nation (Mix).
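
The T-score scaling used in Figure 2 (Mean = 50; SD = 10) can be illustrated with a minimal sketch. The paper does not state how the standardization was computed, so the use of the population standard deviation here is an assumption, and the raw point values are hypothetical, not data from the experiment.

```python
from statistics import mean, pstdev


def t_scores(raw_scores):
    """Rescale raw scores so that they have mean 50 and SD 10."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # assumption: population SD, not sample SD
    if sd == 0:
        raise ValueError("all scores are identical; T-scores are undefined")
    return [50 + 10 * (x - m) / sd for x in raw_scores]


# Hypothetical raw goodwill points for five teams (illustration only).
example_points = [120, 80, 100, 140, 60]
example_t = t_scores(example_points)
```

A team whose raw score equals the overall mean receives a T-score of exactly 50, which is why teams above 50 in Figure 2 can be read as the upper half of the performance distribution.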


2.0  LESSONS  LEARNED 

As  the  previous  section  suggests,  the  NATO  RTO  HFM-138/RTG  experiment  is  conceptually  simple  but  very 
complex  with  respect  to  methodological  aspects  such  as  the  role-play  game  itself,  details  of  the  scenario,  and  a 
team’s  task  and  options.  The  experiment  was  also  complex  logistically  both  within  a  session  and  throughout 
the  entire  experiment.  In  conducting  the  experiment,  we  learned  numerous  lessons  within  the  broad  categories 
of conception, the game itself, methodology, logistics & execution, and analysis.

2.1 Concepts, Hypotheses, & Theoretical Issues

This experiment used a complex computer game to study adaptability in multinational coalitions. This is
appropriate due to the inherent and pronounced immersive quality of such games, but also due to the fact that
tomorrow’s military recruits are growing up playing more and more such games and developing computer and
communications skills not typical of people from a generation ago. Questions about what makes some teams
more effective than others are difficult to answer in general, but differential computer experience adds a fresh
and urgent dimension to these questions about team adaptability, especially in multinational coalitions.

2.2  The  Game  &  Its  Characteristics 

As stated in Section 1.1, the game we used is based on a complex, very absorbing, and immersive role-play
game, Neverwinter Nights. Using this game, BBN Technologies developed a general-purpose research tool
termed SABRE (Situation Authorable Research Environment) (Warren et al. 2004). At the request of NATO
HFM-138, Leung, Diller, and Ferguson (2005; see also Warren et al. 2005) developed a modern
search-for-contraband scenario specifically tailored for this experiment. Both the general SABRE tool and the
specific LTAMC scenario were extensively piloted and iteratively refined, and we learned numerous lessons in
the development phase and the execution phase of the research.

• Features that permit creativity, variant behaviors: The game and task were chosen so as to permit a
large degree of creativity and self-determination by the teams in how they would approach
accomplishing their mission. But the more degrees of freedom given the teams and the more
unstructured the task, the less control the experimenters have and the harder it is to interpret the
various results. It should be noted that the experimental scenario was relatively “static” in that there
were no surprises or major incidents occurring during the game-play. Non-briefed events
could certainly be introduced into the game, but we chose not to do so in order to maintain a degree of
comparability in the experiment.

• Main tasks versus side quests: Although the scenario was relatively “static” as just discussed, there
were some opportunities for teams to engage in “side quests” (such as helping a non-player character
computer-generated girl search for a lost pet) which could garner goodwill points but which would
take time away from the main task. Such side quests do add realism and permit opportunities for
non-routine decision making.

•  Experimenter’s  viewpoint  and  used  and  unused  game  features:  From  the  experimenter’s  viewpoint, 
the  game  can  be  very  rich  in  decision  making  opportunities.  However,  a  particular  team  might  decline 
various  opportunities  or  not  be  very  creative  and  thus,  as  the  game  unfolds,  the  game  can  evolve  into 
something  less  rich  because  certain  avenues  are  not  explored. 

• Player’s viewpoint: By observing the players and from their comments after the experiment, it is clear
that the game succeeded in being immersive and absorbing. Players did not report trying to figure out
what the experiment was about, but rather quickly became fully engaged in the task at hand.

• Experimenter interaction/intervention possibilities: The underlying game (Neverwinter Nights) has a
“dungeon-master mode” feature in which a game-master (or an experimenter) can have an invisible
“avatar” (i.e., personal representative character in the game) which can interact with the game
environment and other characters. We only used this feature on the rare occasions when a human
player’s avatar got “stuck” in a wall (there are occasional glitches since the software is very complex)
to free the avatar without the human’s awareness. One lesson learned is to be prepared for such events
and to know how to deal with them.




RTO-MP-HFM-142 


Using  a  Computer  Game  for  Research  on  Culture  and 
Team  Adaptability:  Lessons  Learned  from  a  NATO  Experiment 


• A related lesson for future research is that the dungeon-master mode can be utilized to introduce some
player-action-contingent events into the game-play. For example, a door could be closed (by the
unseen dungeon master) thereby trapping the human player until they radio for rescue by another
player. Such in-game or in-line modifications require active monitoring and in-game intervention by
an experimenter, but the possibilities are intriguing.

• Underlying & unused game features: Since the underlying game permits many behaviors which are
not needed or allowed in a particular scenario (e.g., casting spells), it is important to prevent their
accidental use by, for example, disabling the right-mouse button.

2.3  Methodology 

The game and LTAMC scenario we used are complex to learn and complex to play, but the permitted behaviors
are manifold. This richness means that certain methodological aspects that are normally under an
experimenter’s complete control in a more traditional laboratory experiment are not controlled or even
uncontrollable. Some methodological lessons we learned or special problems we encountered in conducting the
study are:

2.3.1  Participants 

•  Incomparability  of  subject  pools:  When  participants  come  from  multiple  countries,  it  is  very  difficult 
to  be  sure  that  the  subject  pools  are  comparable.  For  example,  a  junior  officer  in  one  country  might  be 
considered  a  student  in  a  second  country  and  hence  not  in  the  pool  of  the  second  country. 

•  Size  of  subject  pool:  In  spite  of  the  size  of  many  militaries,  the  pool  of  available  participants  can  be 
surprisingly  small.  Military  officers,  in  particular,  are  busy  people  and  often  have  critical  jobs  from 
which they cannot be spared for a block of 4 to 6 hours. When constraints are placed on the
characteristics  of  an  entire  team,  such  as  requiring  a  certain  age  range,  the  effective  size  of  the  pool 
can,  and  does,  shrink  drastically. 

• Representativeness of participants to intended application: Military officers have specialized
occupations and some of these are not interchangeable. A medical officer cannot be expected to
perform  the  work  of  a  pilot.  When  the  pool  of  possible  participants  is  small,  allowance  must  be  made 
to  permit  more  people  to  qualify  for  the  experiment.  Unfortunately,  this  means  that  the  relevance  of 
the  results  to  the  target  population  could  become  compromised. 

•  Team  formation:  Within  a  country  and  within  the  same  research  site,  some  individuals  might  know 
each  other  and  some  might  be  strangers.  But  teams  whose  members  have  a  common  past  history  can 
be  expected  to  function  differently  than  teams  whose  members  are  strangers.  A  background  question 
about  prior  knowledge  of  or  experience  with  other  team  members  should  be  included  along  with  the 
demographic  questions. 

•  Distributed  “team”  issues  &  considerations:  When  team  members  come  from  different  geographic 
locations  or  even  nations,  there  are  special  issues  of  team  formation  and  identification  with  the  team. 
This  problem  is  compounded  when  the  only  interaction  team  members  can  have  is  via  a  keyboard. 
But,  however  cumbersome  “introductions”  and  interactions  might  be  among  distributed  teams,  such 
teams  are  becoming  more  and  more  common. 

• Non-player characters: The town populace was composed of computer-generated “non-player
characters” (NPCs). The avatars of the human players could interact with the NPCs via scripted
question and answer sets. The NPCs were programmed to make a variety of responses such as
providing tips regarding the whereabouts of suspicious activity. But some NPCs could lie (i.e., they
were programmed to provide false information). NPCs have great potential in general for research
purposes. We see this area as needing more work, but one which can bring rich rewards, especially as
the NPCs take on theoretically-based or empirically-grounded personality and cultural
characteristics. The number, content, and veracity of messages should be addressed by any researcher.

2.3.2  Experimental  Design 

•  As  discussed  above,  the  pool  of  potential  participants  can  be  very  small.  Thus,  it  is  imperative  to  use 
as  efficient  an  experimental  design  as  possible  with  respect  to  the  number  of  necessary  participants. 

•  The  experiment  must  also  be  very  efficient  with  respect  to  its  time  demands.  Six  hours  makes  it  hard 
to  get  participants  and  also  can  be  a  strain  on  the  participants.  The  total  amount  of  time  includes  time 
for  pre-  and  post-game  questionnaires.  These  need  to  be  kept  to  a  minimum. 

• Statistical Design, matched samples, controlled & uncontrolled variables: Another consequence of the
limited subject pool is that there are few possibilities for matching subjects on extraneous variables or
for assigning subjects to pre-specified levels in a factorial design on factors such as age. In a
companion paper, Warren (2008) has argued that full experimenter control over all variables of
interest in a complex experiment is not just difficult but actually impossible. However, this does not
mean that the effects of the confounding variables such as computer-game experience, English
proficiency, or other covariates cannot be assessed. Using analysis of covariance (ANCOVA) and
other regression-based techniques, these effects can be measured and then be statistically partialled
out.
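
The idea of statistically partialling out a covariate can be sketched as follows. This is a deliberately simplified, single-covariate illustration of the regression step only, not the ANCOVA actually run on the LTAMC data; the function names and all numbers are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def residualize(x, y):
    """Return y with the linear effect of covariate x removed."""
    slope, intercept = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical team-level values (illustration only, not experiment data).
english_proficiency = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
goodwill_t_score = [42.0, 47.0, 49.0, 55.0, 58.0, 61.0]
adjusted = residualize(english_proficiency, goodwill_t_score)
```

The adjusted scores are, by construction, linearly uncorrelated with the covariate, so any remaining differences between groups of teams cannot be attributed to a linear effect of that covariate. A full ANCOVA would instead estimate the group effect and the covariate slope in a single model.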


2.3.3  Procedures 

•  The  use  of  a  computer  game  does  not  obviate  the  use  of  more  traditional  5-  or  7-point  rating  scales. 
We  used  both  pre-  and  post-game  questionnaires  for  obtaining  such  information  as  demographic  data 
and personality and cultural profiles. But another feature that recommends use of the game is the
occurrence  of  in-game  probes.  As  mentioned  earlier,  on  three  occasions,  a  “superior  officer”  (wholly 
within  the  game),  probed  the  participants  with  questions  relating  to  their  situation  awareness.  The  use 
of  in-game  probes  can  be  a  powerful  tool  and  is  a  supplement  to  the  out-of-game  questionnaires  and 
the  in-game  situations  (which  are  themselves  tests). 

• Training: different learning curves and times: There were two training phases, one in which
individuals learned basic one-person actions such as moving forward, picking up objects, using a
map, using one’s journal, etc., and a second phase in which an individual learned to communicate
with others. People were permitted to complete basic individual-action training at their own pace. But
this meant that people finished basic training at different times. Fast learners often had to wait a long
while at an in-game waiting area while slower learners were still mastering basic skills. The in-game
waiting area had amusing activities to keep people busy, but it could be a long time, and the
amusement nature of the filler activities could contribute to a sense that the overall game was not a
serious exercise.

• Training: proficiency criteria and removal concerns: Related to the problem of different people taking
different amounts of time to reach a sufficient level of proficiency is the question of at what level to set
the proficiency criteria. Although this never occurred in the main experiment, we did have a case during
piloting with non-military participants when one individual simply could not achieve sufficient skill to
enable that experimental run to continue. Since this occurred during piloting, no time limit had been set,
and this led to a boredom problem with the other three players. Of course, not only do such aborted
sessions waste people’s time, they can also be costly in terms of money since (non-military) participants
still have to be paid.

• Local testing issues: breaks etc. When testing was at one site, the procedure was to conduct
pre-questionnaire completion, individual, and team testing phases before lunch. The planning and search
phases were after lunch, but this raises the chances that some forgetting might take place. We now
recommend that a short “refresher” training session occur after lunch.

• Distributed testing: time zones consideration: The mixed-nation testing was done over the Internet.
But since the experiment spanned 6 time zones and could take 6 hours, the experiment began
relatively early in the morning for the Americans and ended relatively late at night for the Europeans.
The previous point’s reference to “lunch” has to be modified, but the issue of the timing of breaks
becomes even more important. Anything that lengthens the experiment, such as the above
recommendation for a “refresher” training phase, must be carefully weighed against the effect of a
long day on some people’s performance.


2.4  Administration  &  Logistics 

• Subject scheduling issues: As discussed above, the size of the pool of possible participants was
severely limited. One administrative difficulty that resulted from this was that of being able to
schedule at least four people for a test day. It often took considerable effort on the part of the research
team to locate and enlist the minimum of four people needed for a team.

•  The  difficulties  were  great  enough  that  there  were  times  when  a  session  had  to  be  canceled  in  advance 
due  to  either  the  inability  to  locate  four  participants  or  due  to  the  advance  cancellation  by  one  of  the 
volunteers.  This  again  put  a  burden  on  the  research  team  to  contact  the  remaining  volunteers. 

• Even on days when four people had been scheduled, there was the all too common and exasperating
problem  of  a  scheduled  volunteer  not  appearing  and  thus  forcing  the  cancellation  of  the  session  and 
the  attendant  loss  of  time  of  those  who  did  appear  for  the  experiment. 

• One technique for dealing with the problem of “no-shows” is to schedule more people than
required.  Due  to  our  limited  participant  pool,  this  option  was  difficult  to  exercise. 

•  Even  if  we  had  a  large  pool  and  could  routinely  “overbook”  participants,  overbooking  does  not 
guarantee  that  the  required  number  of  participants  will  show  up.  The  reality  of  research  on  teams 
is that no-shows are all too common: If 6 people are scheduled for a four-person session, only
three  might  show  up. 

• But “overbooking” has its own problems. One problem is that if all show up for the experiment,
some  method  has  to  be  used  to  determine  whom  to  dismiss  and  in  such  a  way  that  the  excused 
person  is  treated  with  respect  and  made  to  feel  that  their  effort  is  still  appreciated  and  not  wasted. 

• In research without the need for participants with highly specialized characteristics, one way to
not waste any “unusable” participants who report for an experiment (either too few or too many)
is to have alternate lower-priority experiments ready which can use whatever number of
participants are available after due consideration for the needs of the highest priority experiment.
However, this was not an option for us due to the small size of the pool of participants. Any
potential participants who could not be run even after they reported for the experiment needed to
be asked to reschedule if at all possible.

• Scheduling a long experiment over an ocean: The mixed team portion of the experiment often
required having an American and a Bulgarian on the same team. Arranging for a short meeting across 5
or more time zones is hard, but arranging an experimental session that will take six or more hours
means that the Europeans will be finishing quite late in their day and that the Americans will be
starting quite early in their day. The definition of “lunch” break is thus relative and has to be taken
into account when the potential participants are given details about what is being asked of them when
they are solicited.

•  Computer  operators  and  local  administrators:  The  above  remarks  about  long  experiments  across  an 
ocean  also  apply  to  the  local  computer  operators  and  local  experiment  administrators  who,  by  the 
nature  of  their  responsibilities,  must  be  present  both  before  and  after  the  participant  session. 

• Internet operators: The mixed team portion of the experiment also required the use of a
knowledgeable team of SABRE experts and an internet operations center to “host” and coordinate the
multi-site internet portion of the experiment. In order to ensure smooth operations and prevent loss of
precious data, the internet operations had to be flawless. This required much advance preparation and
testing of communication links and procedures. Although given scant mention in the experimental
write-up and methods sections of the reports, this aspect of the experiment is crucial and required
considerable effort.
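
The overbooking trade-off discussed in the scheduling points above can be made concrete with a small probability sketch. It assumes that each scheduled volunteer shows up independently with the same probability, which is a simplification, and the show-up rate of 0.8 is a hypothetical value, not an estimate from this experiment.

```python
from math import comb


def prob_enough_show(scheduled, needed, p_show):
    """Binomial probability that at least `needed` of `scheduled` people appear."""
    return sum(
        comb(scheduled, k) * p_show ** k * (1 - p_show) ** (scheduled - k)
        for k in range(needed, scheduled + 1)
    )


# Hypothetical show-up rate; compare exact booking with overbooking by two.
P_SHOW = 0.8
exact_booking = prob_enough_show(4, 4, P_SHOW)
overbooked = prob_enough_show(6, 4, P_SHOW)
```

Under these assumptions, overbooking raises the chance of a usable four-person session but never to certainty, which matches the observation that even six scheduled volunteers can yield only three attendees.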

2.5  Data  Collection,  Processing  &  Analysis 

The SABRE testbed features automatic data collection of both pre-game questionnaires and within-game
activity and communications. SABRE also collates the data from the various individual team sessions
into large spreadsheet files for post-processing by various statistical packages.

• Although SABRE does provide some basic statistics, it was felt best to leave the main analyses to the
various members of the experimental teams and the statistical packages they prefer. One reason for
this is the large and diverse nature of the data recorded and the subsequent opportunities for
post-experiment data mining. We believe that the datasets resulting from this experiment will yield rich
treasures as we continue to mine them.

• With a data set resulting from the game-play and questionnaires of 224 participants, it is inevitable
that there will be some missing data. Since different analysts have different preferences for dealing
with missing data, it is imperative that there be tight configuration management of the raw and
early-processed data sets that are distributed to the various analysts. In turn, it is also important that the
various analysts maintain their own processed-data file configuration management with a full
description of the decisions they made and the procedures they followed.

2.6 Drawing Conclusions & Making Recommendations

In  spite  of  running  224  participants,  the  resulting  number  of  four-person  teams  was  56. 

•  Since  our  analyses  are  all  team-centric,  the  conclusions  are  based  on  the  relatively  small  number  of  56 
teams.  As  such,  statistical  power  is  weak  and  the  conclusions  must  be  taken  with  caution. 

• Also, as discussed by Warren (2008), there are several confounds that also serve to temper our
conclusions and recommendations, such as participant differences in age, computer-game experience, and
English proficiency.

• However, the confounds are, to a large degree, unavoidable due to the complex nature of the
participant populations. They are not deficiencies in the experimental design. Fortunately, there are
statistical techniques such as ANCOVA and linear regression which can “partial out” the effects of
the confounds and enable the drawing of confound-free conclusions.


3.0  FINAL  COMMENTS  &  GENERAL  QUESTIONS 

We have partial answers to what makes some teams perform well and others not so well. But, in general,
much of what makes a team adaptable in a multinational coalition is still not fully understood. However, we
believe we have demonstrated the value of using an immersive computer game to provide rich data sets to
help provide such answers. As tomorrow’s military recruits become more experienced with complex
immersive computer games than the recruits of yesterday, it becomes imperative that we study the possible
impact of such experience on selection and training for tomorrow’s more computer-reliant military.